CPU optimized kernel #10

jiqing-feng · 2025-11-20T06:49:21Z

Introduces BRGEMM (batch-reduce GEMM) to accelerate TTFT (time to first token) by up to 10x; the speed-up grows with input length.
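BRGEMM computes one output tile as the sum of several small matrix products, C += Σ_i A_i·B_i, so a single accumulator is reused while the kernel walks the reduction (K) dimension block by block. The NumPy sketch below only illustrates those semantics. It is not this PR's kernel, and `brgemm` and `split_k` are hypothetical helper names.

```python
import numpy as np


def brgemm(a_blocks, b_blocks, c=None):
    """Batch-reduce GEMM: C += sum_i A_i @ B_i.

    a_blocks: shape (batch, M, K_blk)
    b_blocks: shape (batch, K_blk, N)
    c:        optional (M, N) accumulator, updated in place
    """
    a_blocks = np.asarray(a_blocks)
    b_blocks = np.asarray(b_blocks)
    batch, m, k = a_blocks.shape
    if b_blocks.shape[0] != batch or b_blocks.shape[1] != k:
        raise ValueError("a_blocks and b_blocks have incompatible shapes")
    n = b_blocks.shape[2]
    if c is None:
        c = np.zeros((m, n), dtype=np.result_type(a_blocks, b_blocks))
    # One output tile, many small products reduced into the same accumulator.
    for i in range(batch):
        c += a_blocks[i] @ b_blocks[i]
    return c


def split_k(a, b, k_blk):
    """Cut A (M, K) and B (K, N) into K-blocks so that A @ B == brgemm(*split_k(...))."""
    m, k = a.shape
    if k % k_blk != 0:
        raise ValueError("K must be a multiple of k_blk in this sketch")
    nb = k // k_blk
    a_blocks = a.reshape(m, nb, k_blk).transpose(1, 0, 2)  # (nb, M, k_blk)
    b_blocks = b.reshape(nb, k_blk, b.shape[1])            # (nb, k_blk, N)
    return a_blocks, b_blocks
```

For any `a` of shape (M, K) and `b` of shape (K, N) with K divisible by `k_blk`, `brgemm(*split_k(a, b, k_blk))` reproduces `a @ b` up to floating-point rounding.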
Build commands:
First, get the CMake prefix path of the installed PyTorch:
python -c "import torch; print(torch.utils.cmake_prefix_path)"
The output looks like: /opt/venv/lib/python3.12/site-packages/torch/share/cmake
Then pass that path to CMake and build:
cmake -DCOMPUTE_BACKEND=cpu -DCMAKE_PREFIX_PATH=/opt/venv/lib/python3.12/site-packages/torch/share/cmake -S . && make
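To avoid copying the prefix path by hand, the two steps can be chained from Python. This is a minimal sketch, assuming it is run from the repository root. `torch_cmake_prefix_path`, `build_commands` and `run_build` are hypothetical helper names; the only PyTorch API used is `torch.utils.cmake_prefix_path`, the same one queried in the build commands.

```python
import shlex
import subprocess


def torch_cmake_prefix_path():
    """Same value as `python -c "import torch; print(torch.utils.cmake_prefix_path)"`."""
    try:
        import torch
    except ImportError:
        return None
    return torch.utils.cmake_prefix_path


def build_commands(prefix_path, backend="cpu", source_dir="."):
    """Return the configure and build commands from the PR description as argv lists."""
    return [
        [
            "cmake",
            f"-DCOMPUTE_BACKEND={backend}",
            f"-DCMAKE_PREFIX_PATH={prefix_path}",
            "-S",
            source_dir,
        ],
        ["make"],
    ]


def run_build(source_dir="."):
    """Locate PyTorch's CMake config, then configure and build in the current directory."""
    prefix = torch_cmake_prefix_path()
    if prefix is None:
        raise RuntimeError("PyTorch is not installed, so its CMake prefix path is unknown")
    for cmd in build_commands(prefix, source_dir=source_dir):
        print(shlex.join(cmd))
        # check=True stops at the first failing step, like `&&` in the shell version.
        subprocess.run(cmd, check=True)
```

Calling `run_build()` performs the same steps as the shell commands. Passing argv lists instead of a shell string keeps paths with spaces intact.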

Signed-off-by: jiqing-feng <[email protected]>

jiqing-feng · 2025-11-20T07:09:59Z

This kernel depends only on PyTorch, which BNB already requires.

Signed-off-by: jiqing-feng <[email protected]>

yao-matrix · 2025-11-21T17:56:20Z

I don't think libtorch is a problem. The concern should be ABI compatibility, i.e. what happens when you build against version x but run with version y.

jiqing-feng · 2025-11-24T06:05:06Z

> I don't think libtorch is a problem, the concern should be on ABI compatibility, which means you build in version x, but what happens when it runs w/ version y.

Yes, the BNB maintainer raised the same point, which is why he recommended putting this implementation in kernel-community. BNB can then pull the kernels from there, which should resolve the problem of building against one PyTorch version and running with another.
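The ABI concern in this exchange can be made concrete with a load-time guard. This is a minimal sketch, assuming the extension records the PyTorch version and the `torch.compiled_with_cxx11_abi()` value it was built with; how it would store them is not covered here, and `check_torch_compat` is a hypothetical name. Requiring matching major.minor is a heuristic, since PyTorch does not promise a stable C++ ABI across releases.

```python
def parse_major_minor(version):
    """'2.5.1+cpu' -> (2, 5); local suffixes such as '+cpu' are dropped."""
    base = version.split("+", 1)[0]
    major, minor = base.split(".")[:2]
    return int(major), int(minor)


def check_torch_compat(built_version, built_cxx11_abi, run_version, run_cxx11_abi):
    """Raise RuntimeError if the runtime PyTorch likely cannot load the extension.

    Heuristic (an assumption, not a PyTorch guarantee): the extension is treated as
    compatible only when major.minor and the C++11 ABI flag both match.
    """
    if built_cxx11_abi != run_cxx11_abi:
        raise RuntimeError(
            f"C++11 ABI mismatch: built with {built_cxx11_abi}, running with {run_cxx11_abi}"
        )
    if parse_major_minor(built_version) != parse_major_minor(run_version):
        raise RuntimeError(
            f"built against torch {built_version}, but torch {run_version} is loaded; "
            "rebuild the extension"
        )


def runtime_torch_info():
    """Version and C++11 ABI flag of the PyTorch that is actually imported."""
    import torch

    return str(torch.__version__), torch.compiled_with_cxx11_abi()
```

Failing early with a clear message is friendlier than the undefined-symbol import error that a mismatched extension typically produces.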

jiqing-feng added 2 commits November 18, 2025 16:30

enable brgemm

6ecdc91

Signed-off-by: jiqing-feng <[email protected]>

enable brgemm; with bug on dequant out brgemm

09fb24f

Signed-off-by: jiqing-feng <[email protected]>

jiqing-feng added 7 commits November 20, 2025 09:05

test

2c9af49

Signed-off-by: jiqing-feng <[email protected]>

rm at for Btmp

c695956

Signed-off-by: jiqing-feng <[email protected]>

fix Btmp

3fe7317

Signed-off-by: jiqing-feng <[email protected]>

Merge branch 'cpu_fused_kernel' into cpu_optimzed_kernel

c7fcca8

use_brgemm_dequant_out threshold

102b204

Signed-off-by: jiqing-feng <[email protected]>

fix Btmp_start

be012e8

Signed-off-by: jiqing-feng <[email protected]>

fix brgemm def

4abec0c

Signed-off-by: jiqing-feng <[email protected]>
